Statistical Significance
Lecture 8
Results from survey--thank you!!
- Lecture videos (from the summer) on ELMS
- Last 2 homework deadlines are now on Sunday instead of
Friday
- Final exam topics will be more spread out
- Fun Challenge Problems
- Will add more as I think of them
- Reminder: answers to in-class exercises are on the lecture slides
and/or in the lecture code (both posted to ELMS)
Review
- plot(x = ____, y = _____)
- plot(x = ____, y = _____, main = "__", xlab = "__", ylab = "__")
- lm(y ~ x)
- summary(lm(y ~ x))
- abline(linear_model)
p-value
We saw that calling summary() on our linear model outputs
p-values
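As a quick refresher, the review functions can be chained together on one dataset. This is only a sketch: it assumes R's built-in cars data (speed and stopping distance), which is not the data used in lecture, and the titles and axis labels are made up for illustration.

```r
# Assumption: the built-in cars dataset stands in for the lecture data
x <- cars$speed
y <- cars$dist

# Scatter plot with a title and axis labels
plot(x = x, y = y, main = "Stopping distance vs. speed",
     xlab = "Speed (mph)", ylab = "Distance (ft)")

# Fit the linear model and draw the fitted line on the plot
linear_model <- lm(y ~ x)
abline(linear_model)

# summary() reports a p-value for each coefficient;
# they sit in the last column of the coefficient table
model_summary <- summary(linear_model)
p_values <- model_summary$coefficients[, "Pr(>|t|)"]
print(p_values)
```

The rest of this lecture is about what those p-values mean.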
Statistical Significance
- Random variable: assigns real number to outcome of
experiment
- Ex: heads or tails of a coin flip, hours of sleep
gained/lost, sample mean
- We don’t know what the value will be, but we can estimate
the probability the random variable takes on certain values
Image: https://www.khanacademy.org/math/ap-statistics/probability-ap/randomness-probability-simulation/a/theoretical-and-experimental-probability-coin-flips-and-die-rolls
If we assume probability of 1 head = 0.5,
That means:
Probability of 2 heads = 0.5 * 0.5
Out of 100 flips, probability = 100C2 * 0.5^2 * 0.5^98
Probability of 3 heads = 0.5 * 0.5 * 0.5
Probability of 100 heads = 0.5^100
Coin Example
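The coin probabilities above can be checked in R. This is a sketch, assuming a fair coin (probability of heads = 0.5); choose() and dbinom() are base R functions, and the variable names are only for illustration.

```r
p_head <- 0.5  # assumed probability of one head

# 2 heads in 2 flips
p_two_heads_two_flips <- p_head * p_head

# Exactly 2 heads out of 100 flips: 100C2 * 0.5^2 * 0.5^98
p_two_heads_100_flips <- choose(100, 2) * p_head^2 * (1 - p_head)^98

# Same quantity from the binomial distribution function
p_two_heads_dbinom <- dbinom(2, size = 100, prob = p_head)

# 100 heads in 100 flips
p_100_heads <- p_head^100

print(c(p_two_heads_two_flips, p_two_heads_100_flips, p_100_heads))
```

The hand-written formula and dbinom() should agree, and the probability of 100 heads in a row is extremely small.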
Statistical Significance: Sleep
- Sleep dataset: impossible to have the entire population try
this drug to see if sleep increases or not → only 20 students
were sampled
- 10 placed in control group (no drug/placebo)
- 10 placed in experimental group (drug)
- How do we know if a result found in our sample would also
hold for the general population?
- We don’t know for sure, but we can estimate the probability
of getting our sample result
Statistical Significance: hypotheses
- Null hypothesis (H0): some initial claim about population
being studied
- Alternative hypothesis (Ha): the claim that we would accept
if the null hypothesis was false
- We start off assuming H0 as true
- Reject H0 if probability of getting our sample result is very
small
Image: https://www.khanacademy.org/math/ap-statistics/probability-ap/randomness-probability-simulation/a/theoretical-and-experimental-probability-coin-flips-and-die-rolls
If we assume probability of 1 head = 0.5,
That means:
Probability of 2 heads = 0.5 * 0.5
Probability of 3 heads = 0.5 * 0.5 * 0.5
Probability of 100 heads = 0.5^100
H0: p = 0.5
If we actually got 100 heads, probability is
so small that we should reject H0
Coin Example
Modeling sample means (sleep) is not
as simple as coin flips--we’ll use a t-
distribution instead of a binomial one
Statistical Significance: t-test
- Uses t-distribution to determine if difference between
sample mean and another value is statistically significant
- Sleep dataset: determine if experimental drug affects
amount of sleep
- Mean we want to analyze: sleep hour difference
(gained/lost)
Let’s see an example before getting
the formal definition of a p-value
One-sample t-test, two-sided
- Null hypothesis: mean of extra = 0
- Alternative hypothesis: mean of extra is not equal to 0
- Function name: t.test()
- x = vector of interest
- alternative = "two.sided" means we
are using a two-sided alternative
hypothesis
- We don't care which direction the
mean of sleep$extra is different
from 0 (mean of sleep$extra
could be greater than 0 or less
than 0)
One-sample t-test, two-sided
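The call described above might look like the following sketch. It uses the sleep dataset that ships with R; the variable names result_two_sided, sample_mean, and p_value are only for illustration.

```r
# sleep is built into R; extra = change in hours of sleep
# H0: mean of extra = 0, Ha: mean of extra is not equal to 0
result_two_sided <- t.test(x = sleep$extra, alternative = "two.sided")
print(result_two_sided)

# Individual pieces of the result can be pulled out by name
sample_mean <- mean(sleep$extra)
p_value <- result_two_sided$p.value
```

The printed output includes the t statistic, the p-value, a 95% confidence interval, and the sample mean.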
What is a p-value?
p-value: probability of obtaining our result or an even
more extreme result, assuming the null hypothesis is
true
For sleep, null hypothesis: mean of extra = 0
What is a p-value?
- Mean of sleep$extra = 1.54 hours
- Is this due to random chance or is the true mean actually
different from 0?
- p-value here is probability of selecting a sample whose mean
of sleep$extra is >= 1.54 or <= -1.54 hours, if the true mean for
the entire infinite population were 0 hours
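To see where that probability comes from, the p-value can be rebuilt by hand from the t-distribution. This is a sketch under the same null hypothesis (true mean = 0); pt() is the base R cumulative distribution function for the t-distribution, and the variable names are only for illustration.

```r
x <- sleep$extra
n <- length(x)

# t statistic: how many standard errors the sample mean is from 0
t_stat <- (mean(x) - 0) / (sd(x) / sqrt(n))

# Two-sided p-value: area in both tails beyond |t_stat|,
# using a t-distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

# Compare with what t.test() reports
p_builtin <- t.test(x = x, mu = 0, alternative = "two.sided")$p.value
print(c(p_manual, p_builtin))
```

The two numbers should match, which shows that t.test() is doing this tail-area calculation for us.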
What is a p-value?
- p-values should be < 0.05 to be statistically significant
- AKA there should be less than a 5% chance of your result
(or something more extreme) happening by chance if the
null hypothesis is true
- 0.05 is an arbitrary value that is generally accepted to be
the threshold
- We'll use this threshold in our class because it is pretty
standard
What is a p-value?
- Our p-value = 0.002918, which is <0.05, meaning our result is
statistically significant
- Only about a 0.3% chance of getting a sample mean of ±1.54 hours
or more extreme, if the true mean were 0 hours
- We get to reject null hypothesis that true mean = 0 hours
(meaning the drug affects sleep)
- Let's take a step back and consider what we are interested in
- We want to know if number of hours of sleep increased (we
don't want a drug that decreases sleep when students already
get so little sleep)
- Should we be using a two-sided alternative hypothesis?
Different t-tests
One-sample t-test, one-sided
- Null hypothesis: mean of extra = 0
- Alternative hypothesis: mean of extra is > 0
- We only care about if mean sleep
hour difference has increased
(gained sleep)
- alternative = "greater" because
alternative hypothesis is that mean is
> 0
- If interested in other tail,
use alternative = "less"
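A sketch of the one-sided version, again on the built-in sleep data. The variable names are only for illustration.

```r
# H0: mean of extra = 0, Ha: mean of extra > 0
result_greater <- t.test(x = sleep$extra, alternative = "greater")
print(result_greater)

# The other tail, for comparison (Ha: mean of extra < 0)
result_less <- t.test(x = sleep$extra, alternative = "less")

p_greater <- result_greater$p.value
p_less <- result_less$p.value
p_two_sided <- t.test(x = sleep$extra, alternative = "two.sided")$p.value
```

Because the sample mean is above 0, the "greater" p-value is half of the two-sided one, while the "less" p-value is close to 1.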
- Have you noticed anything wrong with our previous analysis?
- We were previously looking at the mean of sleep$extra for the
entire dataset
- Our data clearly contains two different types of participants!
- 10 people in a control group, 10 in an experimental group
- Our previous analysis just found that overall, people
experienced an increase in sleep, but didn't tell us if those
people were in the experimental group or not
Different t-tests
- We still compare means, but we need to separate the
participants in the control (group = 1) and experimental (group
= 2) groups
Two-sample t-test
Two-sample t-test, one-tailed
- Null hypothesis: mean of extra in group 1 = mean of extra in group 2
- Alternative hypothesis: mean of extra in group 1 < mean of extra in
group 2
- Some extra arguments:
- x = sample 1 data
- y = sample 2 data
- var.equal = TRUE to treat the
variances of each sample as
the same.
- Order of arguments matters
- If alternative is sample 1 < sample 2, alternative = "less"
- If alternative is sample 1 > sample 2, alternative = "greater"
- We can flip order of experimental/control and then switch
alternative:
Two-sample t-test, one-tailed
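A sketch of both argument orders, treating group 1 as control and group 2 as experimental as this lecture does. The subset names control and experimental are only for illustration.

```r
# group is stored as a factor, so compare against the labels "1" and "2"
control <- sleep$extra[sleep$group == "1"]
experimental <- sleep$extra[sleep$group == "2"]

# Ha: mean of control < mean of experimental
test_less <- t.test(x = control, y = experimental,
                    alternative = "less", var.equal = TRUE)
print(test_less)

# Flip the order of the samples AND switch the alternative
# Ha: mean of experimental > mean of control
test_greater <- t.test(x = experimental, y = control,
                       alternative = "greater", var.equal = TRUE)

print(c(test_less$p.value, test_greater$p.value))
```

Both calls test the same claim, so they give the same p-value.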
Two-sample t-test, alternative syntax
- Use ~ similar to what we did in regression
- Saves us the work of having to create subsets with our data
Be careful with the order of
arguments
In this case, group = 1 for control
and = 2 for experimental.
Because the number for control
is lower, treat it as if it came first
in the arguments, so we use
alternative = "less".
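A sketch of the formula syntax on the built-in sleep data, with group 1 treated as control as in this lecture. The name formula_test is only for illustration.

```r
# extra ~ group means "compare extra across the levels of group"
# The first level of group ("1", control) plays the role of x,
# so Ha: mean of group 1 < mean of group 2 uses alternative = "less"
formula_test <- t.test(extra ~ group, data = sleep,
                       alternative = "less", var.equal = TRUE)
print(formula_test)

# Check which level comes first
print(levels(sleep$group))
```

No subsets are needed; R splits extra by group for us.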
- p-value for difference in sample means is 0.03959
- 0.03959 < 0.05 → we can reject the null hypothesis
- It is likely that the difference in sleep between the control and
experimental group is not 0
- True mean sleep difference for students in the experimental
group is likely greater than that of control group
Interpretation of test results
- There are many ways to design the experiment to collect sleep
data. Imagine the following two cases:
Paired-sample t-test, one-tailed
20 students total:
- Randomly assign 10 to the control
group and 10 to the experimental
- Control group gets a placebo while
the experimental group gets a
drug
- Sleep is measured before and
after to determine sleep
gained/lost.
10 students total: paired design
- Measure sleep difference
between day 1 and 2 with no
drug consumed
- Measure sleep difference
between day 3 and 4 after
consuming drug
Paired-sample t-test, one-tailed
- Same function, additional argument paired = TRUE to specify
that the data is paired
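A sketch of the paired call. It assumes the paired design described above (the same 10 students measured without and with the drug) and that the rows of the two groups in sleep are listed in the same student order; the names no_drug and with_drug are only for illustration.

```r
# Assumption: row i in each group belongs to the same student
no_drug <- sleep$extra[sleep$group == "1"]
with_drug <- sleep$extra[sleep$group == "2"]

# Ha: mean sleep difference with the drug > without the drug
paired_test <- t.test(x = with_drug, y = no_drug,
                      alternative = "greater", paired = TRUE)
print(paired_test)

# A paired test is the same as a one-sample test on the differences
diff_test <- t.test(x = with_drug - no_drug, alternative = "greater")
print(c(paired_test$p.value, diff_test$p.value))
```

The two p-values agree, which is a useful way to remember what paired = TRUE does.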
Summary of t-tests
- One-sample t-test: t.test(x = sample, alternative = "two.sided"
or "greater" or "less")
- Two-sample t-test: t.test(x = sample1, y = sample2, alternative
= "two.sided" or "greater" or "less", var.equal = TRUE or
FALSE)
- Paired t-test: t.test(x = sample1, y = sample2, alternative =
"two.sided" or "greater" or "less", paired = TRUE, var.equal =
TRUE or FALSE)
Your Turn: heart.csv
- Is there a difference in cholesterol level between females and
males?
- Optional: use heart_missing.csv and only exclude those with NA for sex and chol (don’t
use na.omit())
- Optional: stratify by smoking status
- Challenge (optional): think about how you would determine if there is a difference in sex
distribution (number of males vs females) for people with cholesterol < 240 vs cholesterol
>= 240?
Hint:
- t.test(x = sample, alternative = "two.sided" or "greater" or "less")
- t.test(x = sample1, y = sample2, alternative = "two.sided" or "greater" or "less", var.equal
= TRUE or FALSE)
- t.test(x = sample1, y = sample2, alternative = "two.sided" or "greater" or "less", paired =
TRUE, var.equal = TRUE or FALSE)
Heart cholesterol level t test (one possible
solution)
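One way the solution could be written is sketched below. It is not necessarily the code shown in lecture. The column names sex and chol come from the exercise text; that sex has exactly two values, and the helper name compare_chol_by_sex, are assumptions made for illustration.

```r
# Hypothetical helper: two-sided, two-sample t-test of chol by sex
compare_chol_by_sex <- function(heart) {
  # Drop only the rows with NA in sex or chol (not na.omit(),
  # which would also drop rows with NA in unrelated columns)
  keep <- !is.na(heart$sex) & !is.na(heart$chol)
  heart_complete <- heart[keep, ]

  # H0: mean chol is the same for both sexes
  # Ha: mean chol differs between the sexes
  t.test(chol ~ sex, data = heart_complete,
         alternative = "two.sided", var.equal = FALSE)
}

# Usage, assuming heart.csv is in the working directory:
# heart <- read.csv("heart.csv")
# compare_chol_by_sex(heart)
```

The same helper should work for heart_missing.csv, since it filters on the two columns of interest only.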
- You might have noticed that all these test results came with a
95% confidence interval
- Confidence interval: alternative to p-value for statistical
significance
Confidence Intervals
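- (-34.47535, -0.8669947): means the true mean difference in
cholesterol between males and females is between -34.47535
and -0.8669947, with 95% confidence. The interval does not
contain 0, so the difference is statistically significant at the
0.05 level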
Confidence Intervals
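The link between a confidence interval and a p-value can be seen on any t-test result. Since the heart data is not reproduced here, this sketch uses the built-in sleep data instead, with a two-sided two-sample test; the variable names are only for illustration.

```r
control <- sleep$extra[sleep$group == "1"]
experimental <- sleep$extra[sleep$group == "2"]

# Two-sided test so the interval has both a lower and an upper bound
ci_test <- t.test(x = control, y = experimental,
                  alternative = "two.sided", var.equal = TRUE)

# 95% confidence interval for (mean of control - mean of experimental)
ci <- ci_test$conf.int
print(ci)

# If the interval contains 0, the two-sided p-value is above 0.05
contains_zero <- ci[1] <= 0 && ci[2] >= 0
significant <- ci_test$p.value < 0.05
print(c(contains_zero = contains_zero, significant = significant))
```

Here the interval contains 0 and the two-sided p-value is above 0.05, so both approaches lead to the same conclusion.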
- Rejecting a null hypothesis does not prove anything
- Statistical inference is based on probability
- Still a small probability that we could have gotten certain
results based on chance
- Important to re-run experiments with larger sample sizes to
validate results.
Final Note on Significance
- Like with regression, other tests exist, but how we use them in
R follows a similar pattern
- Other functions for statistical testing:
- chisq.test() chi-squared test
- aov() analysis of variance
- wilcox.test() Wilcoxon rank sum and signed rank tests
- fisher.test() Fisher’s exact test
- Many many MANY more
Other statistical tests